Abstract

Answering in a strong form a question posed by Bollobás and Scott, in this paper we determine
the discrepancy between two random $k$-uniform hypergraphs, up to a constant factor depending
solely on $k$.

1 Introduction 

A hypergraph $H$ is an ordered pair $H = (V,E)$, where $V$ is a finite set (the vertex set), and $E$ is a
family of distinct subsets of $V$ (the edge set). The hypergraph $H$ is $k$-uniform if all its edges are of size
$k$. In this paper we consider only $k$-uniform hypergraphs. The edge density of a $k$-uniform hypergraph
$H$ with $n$ vertices is $\rho_H = e(H)/\binom{n}{k}$. We define the discrepancy of $H$ to be

$$\mathrm{disc}(H) = \max_{S \subseteq V(H)} \left| e(S) - \rho_H \binom{|S|}{k} \right|, \qquad (1)$$



where $e(S) = e(H[S])$ is the number of edges in the sub-hypergraph induced by $S$. The discrepancy
can be viewed as a measure of how uniformly the edges of $H$ are distributed among the vertices. This
important concept appears naturally in various branches of combinatorics and has been studied by
many researchers in recent years. The discrepancy is closely related to the theory of quasi-random
graphs (see [6]), as the property $\mathrm{disc}(G) = o(|V(G)|^2)$ implies the quasi-randomness of the graph $G$.
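
For small hypergraphs, definition (1) can be evaluated directly by enumerating all vertex subsets. The following Python sketch does exactly that; the function name `disc` and the edge-list representation are illustrative choices that do not appear in the paper, and the running time is exponential in $n$, so it is only meant for toy examples.

```python
from itertools import combinations
from math import comb

def disc(n, k, edges):
    """Brute-force disc(H) from (1) for a k-uniform hypergraph on vertices 0..n-1."""
    edge_sets = [frozenset(e) for e in edges]
    rho = len(edge_sets) / comb(n, k)              # edge density rho_H
    best = 0.0
    for size in range(n + 1):
        for S in combinations(range(n), size):
            s = set(S)
            e_S = sum(1 for e in edge_sets if e <= s)   # edges induced by S
            best = max(best, abs(e_S - rho * comb(size, k)))
    return best

path = [(0, 1), (1, 2), (2, 3)]        # path on 4 vertices, density 1/2
print(disc(4, 2, path))                # 0.5
```

For the complete hypergraph every subset induces exactly $\rho_H\binom{|S|}{k}$ edges, so the same function returns 0.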

Erdős and Spencer [8] proved that for $k \ge 2$, any $k$-uniform hypergraph $H$ with $n$ vertices has a
subset $S$ satisfying $\left| e(S) - \frac{1}{2}\binom{|S|}{k} \right| \ge c n^{\frac{k+1}{2}}$, which implies the bound $\mathrm{disc}(H) \ge c n^{\frac{k+1}{2}}$ for $k$-uniform
hypergraphs $H$ of edge density $\frac{1}{2}$. Erdős, Goldberg, Pach and Spencer [7] obtained a similar lower
bound for graphs of edge density smaller than $\frac{1}{2}$. These results were later generalized by Bollobás and
Scott in [3], who proved the inequality $\mathrm{disc}(H) \ge c_k \sqrt{r}\, n^{\frac{k+1}{2}}$ for $k$-uniform hypergraphs $H$, whenever
$r = \rho_H(1 - \rho_H) \ge 1/n$. The random hypergraphs show that all the aforementioned lower bounds are
optimal up to constant factors. For more discussion and general accounts of discrepancy, we refer the
interested reader to Beck and Sós [2], Bollobás and Scott [3], Chazelle [5], Matoušek [10] and Sós [11].

A similar notion is the relative discrepancy of two hypergraphs. Let $G$ and $H$ be two $k$-uniform
hypergraphs over the same vertex set $V$, with $|V| = n$. For a bijection $\pi : V \to V$, let $G_\pi$ be obtained
from $G$ by permuting all edges according to $\pi$, i.e., $E(G_\pi) = \pi(E(G))$. The overlap of $G$ and $H$
with respect to $\pi$, denoted by $G_\pi \cap H$, is a hypergraph with the same vertex set $V$ and with edge set
$E(G_\pi) \cap E(H)$. The discrepancy of $G$ with respect to $H$ is

$$\mathrm{disc}(G,H) = \max_{\pi} \left| e(G_\pi \cap H) - \rho_G \rho_H \binom{n}{k} \right|, \qquad (2)$$

where the maximum is taken over all bijections $\pi : V \to V$. For random bijections $\pi$, the expected size
of $E(G_\pi) \cap E(H)$ is $\rho_G \rho_H \binom{n}{k}$, thus $\mathrm{disc}(G,H)$ measures how much the overlap can deviate from its
average. In a certain sense, the definition (2) is more general than (1), because one can write $\mathrm{disc}(H) =
\max_{1 \le i \le n} \mathrm{disc}(G_i, H)$, where $G_i$ is obtained from the complete $i$-vertex $k$-uniform hypergraph by
adding $n - i$ isolated vertices.
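
The relative discrepancy, and the identity $\mathrm{disc}(H) = \max_i \mathrm{disc}(G_i, H)$, can be checked by brute force on very small instances. The sketch below enumerates all $n!$ bijections; the names `rel_disc` and `disc_via_cliques` are hypothetical helpers introduced here for illustration only.

```python
from itertools import combinations, permutations
from math import comb

def rel_disc(n, k, G, H):
    """Brute-force disc(G, H) from (2) over all n! bijections of {0,...,n-1}."""
    G_edges = [frozenset(e) for e in G]
    H_edges = {frozenset(e) for e in H}
    expected = len(G_edges) * len(H_edges) / comb(n, k)   # rho_G * rho_H * C(n, k)
    best = 0.0
    for pi in permutations(range(n)):
        overlap = sum(1 for e in G_edges
                      if frozenset(pi[v] for v in e) in H_edges)
        best = max(best, abs(overlap - expected))
    return best

def disc_via_cliques(n, k, H):
    """max over i of disc(G_i, H), with G_i the complete k-graph on {0,...,i-1}."""
    return max(rel_disc(n, k, list(combinations(range(i), k)), H)
               for i in range(1, n + 1))

path = [(0, 1), (1, 2), (2, 3)]
print(disc_via_cliques(4, 2, path))    # 0.5, the value of disc(H) for this path
```

Here permuting the clique $G_i$ amounts to choosing an $i$-subset $S$ of $V$, which is why the maximum over $i$ recovers the maximum over subsets in (1).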

Bollobás and Scott introduced the notion of relative discrepancy in [4] and showed that for any
two $n$-vertex graphs $G$ and $H$, if $\frac{16}{n} \le \rho_G, \rho_H \le 1 - \frac{16}{n}$, then $\mathrm{disc}(G,H) \ge c \cdot f(\rho_G, \rho_H) \cdot n^{\frac{3}{2}}$, where $c$
is an absolute constant and $f(x,y) = x^2(1-x)^2 y^2 (1-y)^2$. As a corollary, they proved a conjecture
in [7] regarding the bipartite discrepancy $\mathrm{disc}(G, K_{\lfloor n/2 \rfloor, \lceil n/2 \rceil})$. Moreover, they also conjectured that
a similar bound holds for $k$-uniform hypergraphs, namely, there exists $c = c(k, \rho_G, \rho_H)$ for which
$\mathrm{disc}(G,H) \ge c n^{\frac{k+1}{2}}$ holds for any $k$-uniform hypergraphs $G$ and $H$ satisfying $\frac{1}{n} \le \rho_G, \rho_H \le 1 - \frac{1}{n}$.

In their paper, Bollobás and Scott also asked the following question (see Problem 12 in [4]): Given
two random $n$-vertex graphs $G$, $H$ with constant edge probability $p$, what is the expected value of
$\mathrm{disc}(G,H)$? In this paper, we solve this question completely for general $k$-uniform hypergraphs. Let
$\mathcal{H}_k(n,p)$ denote the random $k$-uniform hypergraph on $n$ vertices, in which every edge is included
independently with probability $p$. We say that an event happens with high probability, or w.h.p. for
brevity, if it happens with probability at least $1 - n^{-\omega(n)}$, where here and later $\omega(n) > 0$ denotes an
arbitrary function tending to infinity together with $n$.

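Theorem 1.1. For positive integers $n$ and $k$, let $N = \binom{n}{k-1}$, and let $G$ and $H$ be two random hypergraphs
distributed according to $\mathcal{H}_k(n,p)$ and $\mathcal{H}_k(n,q)$ respectively, where $\frac{\omega(n)}{N} \le p \le q \le \frac{1}{2}$.

(1) dense case - If $pqN > \frac{1}{30}\log n$, then w.h.p. $\mathrm{disc}(G,H) = \Theta_k\left(\sqrt{pq\binom{n}{k} n \log n}\right)$;

(2) sparse case - If $pqN \le \frac{1}{30}\log n$, let $\gamma = \frac{\log n}{pqN}$, then

(2.1) if $pN \ge \frac{\log n}{5\log\gamma}$, then w.h.p. $\mathrm{disc}(G,H) = \Theta_k\left(\frac{n\log n}{\log\gamma}\right)$;

(2.2) if $pN < \frac{\log n}{5\log\gamma}$, then w.h.p. $\mathrm{disc}(G,H) = \Theta_k\left(p\binom{n}{k}\right)$.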
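
To see which regime of Theorem 1.1 a given triple $(n, p, q)$ falls into, the case distinction can be written out as a small function. This is only a sketch: it reads the thresholds as $\frac{1}{30}\log n$ and $\frac{\log n}{5\log\gamma}$ with $N = \binom{n}{k-1}$, the function name `predicted_order` is an illustrative choice, and the returned number is the expression inside $\Theta_k(\cdot)$ without the hidden constants, so it indicates an order of magnitude rather than a value of the discrepancy.

```python
from math import comb, log, sqrt

def predicted_order(n, k, p, q):
    """Return (case label, expression inside Theta_k) following Theorem 1.1.

    Assumes p <= q <= 1/2; the hidden constants of Theta_k are NOT included.
    """
    N = comb(n, k - 1)
    if p * q * N > log(n) / 30:                       # dense case (1)
        return "dense", sqrt(p * q * comb(n, k) * n * log(n))
    gamma = log(n) / (p * q * N)                      # gamma >= 30 in this branch
    if p * N >= log(n) / (5 * log(gamma)):            # sparse case (2.1)
        return "sparse (2.1)", n * log(n) / log(gamma)
    return "sparse (2.2)", p * comb(n, k)             # sparse case (2.2)

print(predicted_order(1000, 2, 0.5, 0.5)[0])          # dense
print(predicted_order(10**6, 2, 5e-4, 5e-4)[0])       # sparse (2.1)
```

The theorem is asymptotic and also requires $pN \to \infty$, which the function does not check.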

The previous theorem also provides tight bounds when $p$ and/or $q > \frac{1}{2}$, as we shall see in the
concluding remarks. The result of Theorem 1.1 in the sparse range is closely related to the recent
work of the third author with Lee and Loh [9]. Among other results, the authors of [9] show that
two independent copies $G$, $H$ of the random graph $G(n,p)$ with $p \ll \sqrt{\log n / n}$ w.h.p. have overlap of
order $\Theta\left(\frac{n\log n}{\log\gamma}\right)$, where $\gamma = \frac{\log n}{p^2 n}$. Hence $\mathrm{disc}(G,H) = \Theta\left(\frac{n\log n}{\log\gamma}\right)$ holds, since in this range of edge
probability, $\frac{n\log n}{\log\gamma}$ is larger than the average overlap $p^2\binom{n}{2}$. Our proof in the sparse case borrows some
ideas from [9]. On the other hand, one cannot use their approach for all cases, hence some new ideas
were needed to prove Theorem 1.1.

It will become evident from our proof that the problem of determining the discrepancy can be
essentially reduced to the following question. Let $K > 0$ and let $X$ be a binomial random variable with
parameters $m$ and $p$. What is the maximum value of $\Lambda = \Lambda(m,p,K)$ satisfying $P[X - mp > \Lambda] \ge e^{-K}$?
This question is related to the rate function of binomial distribution. In all cases, the discrepancy in
the statement of Theorem 1.1 is w.h.p.

$$\mathrm{disc}(G,H) = \Theta_k\left( n \cdot \Lambda\left( p\binom{n}{k-1}, q, \log n \right) \right). \qquad (3)$$

Note that $p\binom{n}{k-1}$ is roughly the size of the neighborhood of a vertex in the hypergraph $G$.
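
The quantity $\Lambda(m,p,K)$ can be explored numerically for moderate $m$ by summing the exact binomial tail. The sketch below is an integer-threshold proxy rather than the paper's definition verbatim: it returns $j - mp$ for the largest integer $j$ with $P[X \ge j] \ge e^{-K}$. The function names are hypothetical.

```python
from math import comb, exp

def binom_pmf(m, p, j):
    """P[X = j] for X ~ Bin(m, p)."""
    return comb(m, j) * p**j * (1 - p)**(m - j)

def deviation_level(m, p, K):
    """Largest integer j with P[X >= j] >= e^{-K}, returned as j - m*p."""
    target = exp(-K)
    tail = 0.0
    for j in range(m, -1, -1):          # accumulate the upper tail from the top
        tail += binom_pmf(m, p, j)
        if tail >= target:
            return j - m * p
    return -m * p                       # only reached through rounding when K is tiny

for K in (1.0, 3.0, 5.0):
    print(K, deviation_level(100, 0.5, K))
```

Larger $K$ allows rarer events and therefore larger deviations, which is the mechanism behind the $\log n$ argument in (3): a union over roughly $n$ vertices can afford events of probability about $1/n$.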

The rest of this paper is organized as follows. Section 2 contains a list of inequalities and technical
lemmas used throughout the paper. In Section 3 we define the probabilistic discrepancy $\mathrm{disc}_P(G,H)$
and prove that w.h.p. it does not deviate too much from $\mathrm{disc}(G,H)$. Additionally, we establish the
upper bounds for $\mathrm{disc}(G,H)$ based on analogous bounds for $\mathrm{disc}_P(G,H)$. In Section 4 we give a
detailed proof of the lower bounds. The final section contains some concluding remarks and open
problems. In this paper, the function log refers to the natural logarithm and all asymptotic notation
symbols ($\Omega$, $O$, $o$ and $\Theta$) are with respect to the variable $n$. Furthermore, the $k$-subscripts in these
symbols indicate the dependence on $k$ in the relevant constants.



2 Auxiliary results 

In this section we list and prove some useful concentration inequalities about the binomial and
hypergeometric distributions and also prove a corollary from the well-known Vizing's Theorem which asserts
the existence of a linear-size matching in nearly regular graphs (i.e., the maximum degree is close to
the average degree). We will not attempt to optimize our constants, preferring rather to choose values
which provide a simpler presentation. Let us start with classical Chernoff-type estimates for the tail
of the binomial distribution (see, e.g., [1]).

Lemma 2.1. Let $X = \sum_{i=1}^{l} X_i$ be the sum of independent zero-one random variables with average
$\mu = E[X]$. Then for all non-negative $\lambda \le \mu$, we have $P[|X - \mu| > \lambda] \le 2e^{-\frac{\lambda^2}{4\mu}}$.
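
As a sanity check, the Chernoff-type bound $2e^{-\lambda^2/(4\mu)}$ can be compared with the exact two-sided tail of a binomial variable. The following sketch does this for one choice of parameters; the helper names and the test values $m = 200$, $p = 0.3$ are arbitrary illustrative choices.

```python
from math import comb, exp

def two_sided_tail(m, p, lam):
    """Exact P[|X - mp| > lam] for X ~ Bin(m, p)."""
    mu = m * p
    return sum(comb(m, j) * p**j * (1 - p)**(m - j)
               for j in range(m + 1) if abs(j - mu) > lam)

def chernoff_bound(mu, lam):
    """Right-hand side 2 exp(-lam^2 / (4 mu)), stated for 0 <= lam <= mu."""
    return 2 * exp(-lam**2 / (4 * mu))

m, p = 200, 0.3
for lam in (5, 15, 30, 60):
    print(lam, two_sided_tail(m, p, lam), chernoff_bound(m * p, lam))
```

The exact tail is typically far below the bound, which is consistent with the remark that the constants in this section are not optimized.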

The following lower tail inequality (see [1]) is due to Janson.

Lemma 2.2. Let $A_1, A_2, \ldots, A_l$ be subsets of a finite set $\Omega$, and let $R$ be a random subset of $\Omega$ for
which the events $r \in R$ are mutually independent over $r \in \Omega$. Define $X_j$ to be the indicator random
variable of $A_j \subset R$. Let $X = \sum_{j=1}^{l} X_j$, $\mu = E[X]$ and $\Delta = \sum_{i \sim j} E[X_i X_j]$, where $i \sim j$ means that
$X_i$ and $X_j$ are dependent (i.e., $A_i$ intersects $A_j$). Then for any $\lambda > 0$,

$$P[X \le \mu - \lambda] \le e^{-\frac{\lambda^2}{2\mu + \Delta}}.$$

In the proof of the dense case of the main theorem we will need a lower bound for the tail of the
hypergeometric distribution. To prove it we use the following well-known estimates for the binomial
coefficient.

Proposition 2.3. Let $H(p) = p\log p + (1-p)\log(1-p)$, then for any integer $m > 0$ and real $p \in (0,1)$
satisfying $pm \in \mathbb{Z}$ we have

$$\frac{\sqrt{2\pi}}{e^2} \cdot \frac{e^{-mH(p)}}{\sqrt{mp(1-p)}} \le \binom{m}{pm} \le \frac{e}{2\pi} \cdot \frac{e^{-mH(p)}}{\sqrt{mp(1-p)}}.$$

Proof. This can be derived from Stirling's formula $\sqrt{2\pi m}\left(\frac{m}{e}\right)^m \le m! \le e\sqrt{m}\left(\frac{m}{e}\right)^m$. □
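
The two-sided estimate for $\binom{m}{pm}$ that follows from Stirling's formula, with constants $\sqrt{2\pi}/e^2$ and $e/(2\pi)$ in front of $e^{-mH(p)}/\sqrt{mp(1-p)}$, can be verified numerically for small $m$. The sketch below assumes exactly this form of the bounds; the helper names are illustrative.

```python
from math import comb, e, exp, log, pi, sqrt

def entropy_term(p):
    """H(p) = p log p + (1 - p) log(1 - p), as defined in Proposition 2.3."""
    return p * log(p) + (1 - p) * log(1 - p)

def binomial_bounds(m, j):
    """Lower and upper estimates for C(m, j) with p = j/m and 0 < j < m."""
    p = j / m
    core = exp(-m * entropy_term(p)) / sqrt(m * p * (1 - p))
    return sqrt(2 * pi) / e**2 * core, e / (2 * pi) * core

lower, upper = binomial_bounds(10, 5)
print(lower, comb(10, 5), upper)       # C(10, 5) = 252 lies between the two estimates
```

Note that $H(p)$ here is negative, so $e^{-mH(p)}$ is the usual exponential growth term of the binomial coefficient.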

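Lemma 2.4. Let $d_1$, $d_2$, $\Delta$ and $N$ be integers and $K$ be a real parameter such that $1 \le d_1, d_2 \le \frac{2N}{3}$,
$1 \le K \le \frac{d_1 d_2}{100N}$ and $\Delta = \sqrt{\frac{d_1 d_2 K}{N}}$. Then

$$\sum_{t \ge \frac{d_1 d_2}{N} + \Delta} \frac{\binom{d_1}{t}\binom{N-d_1}{d_2-t}}{\binom{N}{d_2}} \ge e^{-40K}.$$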

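Proof. For convenience, we write $f(t) = \binom{d_1}{t}\binom{N-d_1}{d_2-t} / \binom{N}{d_2}$. In order to show the desired lower bound of
the hypergeometric sum, it suffices to prove that

$$f(t) \ge \frac{4e^{-40K}}{\sqrt{\frac{d_1 d_2}{N} + \Delta}}$$

for every integer $t = \frac{d_1 d_2}{N} + \theta\Delta$ with $1 \le \theta \le 2$. Indeed, to see this, note that there are at least
$\lfloor\Delta\rfloor \ge \frac{\Delta}{2}$ integers between $\frac{d_1 d_2}{N} + \Delta$ and $\frac{d_1 d_2}{N} + 2\Delta$ and

$$\frac{\Delta}{2} \cdot \frac{4e^{-40K}}{\sqrt{\frac{d_1 d_2}{N} + \Delta}} \ge e^{-40K},$$

since $\frac{d_1 d_2}{N} = \frac{\Delta^2}{K} \le \Delta^2$ and $\Delta \ge 1$.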

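Next we prove the bound for $f(t)$. For our choice of $\Delta$, the inequality $\Delta \le \frac{d_1}{15}$ is true since

$$\Delta = \sqrt{\frac{d_1 d_2 K}{N}} \le \sqrt{\frac{d_1 d_2}{N} \cdot \frac{d_1 d_2}{100N}} = \frac{1}{10} \cdot \frac{d_1 d_2}{N} \le \frac{d_1}{15}.$$

Similarly $\Delta \le \frac{d_2}{15}$. Let $x = \frac{d_2}{N}$, $y = \frac{\theta\Delta}{d_1}$ and $z = \frac{\theta\Delta}{N - d_1}$. Then $t = (x+y)d_1$ and $d_2 - t = (x-z)(N-d_1)$.
But $0 < x + y < 1$, because $0 \le x \le \frac{2}{3}$ and $0 \le y \le \frac{2\Delta}{d_1} \le \frac{2}{15}$. Furthermore, $0 < x - z < 1$, because

$$\frac{z}{x} = \frac{\theta\Delta N}{d_2(N-d_1)} \le \frac{6\Delta}{d_2} \le \frac{2}{5}$$

and $x \le \frac{2}{3}$. By Proposition 2.3 we have

$$f(t) = \frac{\binom{d_1}{(x+y)d_1}\binom{N-d_1}{(x-z)(N-d_1)}}{\binom{N}{xN}} \ge \frac{4\pi^2}{e^5}\sqrt{R}\,e^{-L},$$

where $L = d_1 \cdot H(x+y) + (N-d_1) \cdot H(x-z) - N \cdot H(x)$ and

$$R = \frac{x(1-x)N}{(x-z)(1-x+z)(x+y)(1-x-y)d_1(N-d_1)} \ge \frac{1}{(x+y)d_1} \ge \frac{1}{2} \cdot \frac{1}{\frac{d_1 d_2}{N} + \Delta}.$$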

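Here we used the inequality $\theta \le 2$ and the identity $(x+y)d_1 = t = \frac{d_1 d_2}{N} + \theta\Delta$. Because $d_1 y =
(N-d_1)z = \theta\Delta$ and $\log(1+s) \le s$, we obtain

$$\begin{aligned}
L &= d_1\left[(x+y)\log\left(1 + \frac{y}{x}\right) + (1-x-y)\log\left(1 - \frac{y}{1-x}\right)\right] \\
&\quad + (N-d_1)\left[(x-z)\log\left(1 - \frac{z}{x}\right) + (1-x+z)\log\left(1 + \frac{z}{1-x}\right)\right] \\
&\le d_1\left[\frac{(x+y)y}{x} - \frac{(1-x-y)y}{1-x}\right] + (N-d_1)\left[-\frac{(x-z)z}{x} + \frac{(1-x+z)z}{1-x}\right] \\
&= \theta\Delta \cdot (y+z) \cdot \left(\frac{1}{x} + \frac{1}{1-x}\right) = \frac{\theta^2\Delta^2 N^3}{d_1(N-d_1)d_2(N-d_2)} \le 36K.
\end{aligned}$$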



Thus we always have

$$f(t) \ge \frac{4\pi^2}{e^5} \cdot \frac{e^{-36K}}{\sqrt{2\left(\frac{d_1 d_2}{N} + \Delta\right)}} \ge \frac{4e^{-40K}}{\sqrt{\frac{d_1 d_2}{N} + \Delta}},$$

completing the proof. □
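
The hypergeometric tail bounded in Lemma 2.4 can be computed exactly with integer arithmetic, which gives a quick way to compare it with $e^{-40K}$. The sketch below reads the hypotheses as $d_1, d_2 \le 2N/3$, $1 \le K \le d_1 d_2/(100N)$ and $\Delta = \sqrt{d_1 d_2 K/N}$; the function name and the parameter values are illustrative assumptions.

```python
from math import ceil, comb, exp, sqrt

def hypergeom_upper_tail(N, d1, d2, K):
    """Sum of C(d1,t) C(N-d1,d2-t) / C(N,d2) over integers t >= d1*d2/N + Delta."""
    delta = sqrt(d1 * d2 * K / N)                 # Delta as in Lemma 2.4
    t_min = ceil(d1 * d2 / N + delta)
    numerator = sum(comb(d1, t) * comb(N - d1, d2 - t)
                    for t in range(t_min, min(d1, d2) + 1))
    return numerator / comb(N, d2)                # exact big-integer ratio, then float

N, d1, d2 = 3000, 1000, 1000                      # d1, d2 <= 2N/3
for K in (1.0, 3.0):                              # 1 <= K <= d1*d2/(100N) = 3.33...
    print(K, hypergeom_upper_tail(N, d1, d2, K), exp(-40 * K))
```

In such examples the true tail is many orders of magnitude above $e^{-40K}$; the lemma trades sharpness for a clean statement.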



The next lemma will be used to prove the lower bound in the sparse case of Theorem 1.1 and was
inspired by an analogous result in [9].

Lemma 2.5. For positive integers $n$ and $k$, let $N = \binom{n}{k-1}$, $\frac{\omega(n)}{N} \le p \le q \le \frac{1}{2}$ and suppose that
$pqN \le \frac{1}{30}\log n$. Define $\gamma = \frac{\log n}{pqN}$. Let $N_1, \ldots, N_s \subset B$ be $s \ge n^{1/3}$ disjoint sets of size $(1+o(1))Np$,
and consider the random set $B_q$, obtained by taking each element of $B$ independently with probability
$q$. Then w.h.p., there is an index $i$ for which

(1) $|B_q \cap N_i| \ge \frac{\log n}{6\log\gamma}$ if $pN \ge \frac{\log n}{5\log\gamma}$;

(2) $N_i \subseteq B_q$ if $pN < \frac{\log n}{5\log\gamma}$.



Proof. Let $t = \frac{\log n}{6\log\gamma}$. Clearly $1 - q \ge e^{-3q/2}$ when $q \le 1/2$. For a fixed index $i$, the probability
that $|B_q \cap N_i| \ge t$ is at least $\binom{|N_i|}{t} q^t (1-q)^{|N_i| - t}$. Using the bounds $\binom{a}{b} \ge \left(\frac{a}{b}\right)^b$ for $a \ge b$, and
$\frac{1}{30}\log n \ge Npq$, we obtain

$$\binom{|N_i|}{t} q^t (1-q)^{|N_i|} \ge \left(\frac{(1+o(1))Npq}{t}\right)^t e^{-2pqN} \ge \left(\frac{6\log\gamma}{\gamma}\right)^{\frac{\log n}{6\log\gamma}} n^{-1/15} \ge n^{-0.3}.$$

Hence the expected number of indices $i$ such that $|B_q \cap N_i| \ge t$ is at least $s n^{-0.3} \ge n^{1/30}$. Since the
sets $N_i$ are disjoint, these events are independent for different choices of $i$. Therefore by Lemma 2.1
w.h.p. we can find such an index (actually many).

If $pN < \frac{\log n}{5\log\gamma}$ then $q = \frac{\log n}{\gamma pN} \ge \frac{5\log\gamma}{\gamma} \ge \gamma^{-1}$. Therefore the probability that some $N_i \subseteq B_q$ is

$$q^{|N_i|} \ge \gamma^{-(1+o(1))Np} \ge \gamma^{-\frac{\log n}{4\log\gamma}} = n^{-1/4}$$

and we can complete the proof as in the first case. □



The last lemma in this section, which can be easily derived from Vizing's Theorem, will be used 
to find a linear-size matching in nearly regular graphs. 

Lemma 2.6. Every graph $G$ with maximum degree $\Delta(G)$ contains a matching of size at least $\frac{e(G)}{\Delta(G)+1}$.

Proof. By Vizing's Theorem, the graph $G$ has a proper edge coloring $f : E(G) \to \{1, 2, \ldots, \Delta(G)+1\}$.
For each color $1 \le c \le \Delta(G)+1$, the edges $f^{-1}(c)$ form a matching in $G$. By the pigeonhole principle,
there is a color $c$ such that $f^{-1}(c)$ has at least $\frac{e(G)}{\Delta(G)+1}$ edges. □
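
The guarantee $e(G)/(\Delta(G)+1)$ is easy to compare with the true maximum matching on small graphs. The sketch below does not implement Vizing's edge coloring; it swaps in a plain exhaustive search for a maximum matching (exponential time, small graphs only) and evaluates the bound next to it. Both function names are hypothetical.

```python
def max_matching_size(edges):
    """Size of a maximum matching, by exhaustive branching (small graphs only)."""
    if not edges:
        return 0
    (u, v), rest = edges[0], edges[1:]
    without_uv = max_matching_size(rest)
    with_uv = 1 + max_matching_size([e for e in rest if u not in e and v not in e])
    return max(without_uv, with_uv)

def vizing_matching_bound(edges):
    """e(G) / (Delta(G) + 1), the matching size guaranteed by Lemma 2.6."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return len(edges) / (max(degree.values()) + 1)

# Petersen graph: 15 edges, 3-regular.
petersen = ([(i, (i + 1) % 5) for i in range(5)]
            + [(i, i + 5) for i in range(5)]
            + [(5 + i, 5 + (i + 2) % 5) for i in range(5)])
print(max_matching_size(petersen), vizing_matching_bound(petersen))   # 5 3.75
```

For a nearly regular graph the bound is a constant fraction of the number of vertices, which is the property used later for the pruned graph.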



3 Upper bounds 



In this section we prove the upper bound for the discrepancy in Theorem 1.1. Let $G$ and $H$ be
two random hypergraphs over the same vertex set $V$, distributed according to $\mathcal{H}_k(n,p)$ and $\mathcal{H}_k(n,q)$,
respectively. The probabilistic discrepancy of $G$ and $H$ is defined by

$$\mathrm{disc}_P(G,H) = \max_{\pi} \left| e(G_\pi \cap H) - pq\binom{n}{k} \right|,$$

where the maximum is taken over all bijections $\pi : V \to V$. We will show that w.h.p. the difference
between $\mathrm{disc}(G,H)$ and $\mathrm{disc}_P(G,H)$ is very small. Before we proceed, we state the following fact
whose proof is fairly trivial.

Proposition 3.1. If $|AB - A_0B_0| \ge \varepsilon_1\varepsilon_2 + |A_0|\varepsilon_2 + |B_0|\varepsilon_1$, then either $|A - A_0| \ge \varepsilon_1$ or $|B - B_0| \ge \varepsilon_2$.

Lemma 3.2. With probability at least $1 - 4e^{-\sqrt{n}}$, the inequality $|\mathrm{disc}(G,H) - \mathrm{disc}_P(G,H)| \le 2\varepsilon$ holds,
where $\varepsilon = 4n^{1/4}\sqrt{pq\binom{n}{k}}$.



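Proof. Since $p\binom{n}{k} = \Omega(n)$, applying Lemma 2.1 to the random variable $e(G)$ for $\lambda = 2n^{1/4}\sqrt{p\binom{n}{k}} \le p\binom{n}{k}$
yields

$$P\left[\left|e(G) - p\binom{n}{k}\right| \le 2n^{1/4}\sqrt{p\binom{n}{k}}\right] \ge 1 - 2e^{-\frac{\lambda^2}{4p\binom{n}{k}}} \ge 1 - 2e^{-\sqrt{n}}.$$

Similarly, we have $P\left[\left|e(H) - q\binom{n}{k}\right| \le 2n^{1/4}\sqrt{q\binom{n}{k}}\right] \ge 1 - 2e^{-\sqrt{n}}$. Therefore, with probability at least
$1 - 4e^{-\sqrt{n}}$, $|\rho_G - p| \le 2n^{1/4}\left(p/\binom{n}{k}\right)^{1/2}$ and $|\rho_H - q| \le 2n^{1/4}\left(q/\binom{n}{k}\right)^{1/2}$. These inequalities, together with
Proposition 3.1 imply

$$\left|\rho_G\rho_H - pq\right|\binom{n}{k} \le 4\sqrt{pqn} + 2pn^{1/4}\sqrt{q\binom{n}{k}} + 2qn^{1/4}\sqrt{p\binom{n}{k}} \le 2\varepsilon,$$

completing the proof of the lemma. □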



It is easy to check that the error term $\varepsilon$ is much smaller than the bounds in Theorem 1.1. Therefore,
in order to prove Theorem 1.1 for $\mathrm{disc}(G,H)$, it suffices to prove the corresponding bounds for
$\mathrm{disc}_P(G,H)$ instead.

Lemma 3.3. Let $G$ and $H$ be as in Theorem 1.1. Then with high probability $\mathrm{disc}_P(G,H)$ satisfies the
stated upper bounds of this theorem.

Proof. Since the number of edges of $G$ is distributed binomially and $p\binom{n}{k} = \Omega(n)$, by Lemma 2.1
we have $e(G) \le 2p\binom{n}{k}$ with probability at least $1 - e^{-\Omega(n)}$. Since $\mathrm{disc}_P(G,H)$ is bounded by
$\max\left\{e(G), pq\binom{n}{k}\right\}$, this implies the assertion in the case (2.2) of Theorem 1.1.

For any fixed bijection $\pi : V \to V$, the number of edges in $G_\pi \cap H$ is distributed binomially with
parameters $\binom{n}{k}$ and $pq$. If $pq\binom{n}{k} \ge 4n\log n$ let $\lambda = 2\sqrt{pq\binom{n}{k}n\log n} \le pq\binom{n}{k}$. Then by Lemma 2.1 the
probability that $\left|e(G_\pi \cap H) - pq\binom{n}{k}\right| > \lambda$ is at most $2e^{-n\log n}$. On the other hand, if $pq\binom{n}{k} < 4n\log n$,
let V = > e and A = > = epg (n). Since («) < ( f )^ the probability that 

e(G,r n Lf ) > A is at most 

Since there are $n!$ possible bijections $\pi : V \to V$, by the union bound

$$P\left[\mathrm{disc}_P(G,H) > \lambda\right] \le n! \cdot 2e^{-n\log n} \le e^{-n/2}.$$
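
The last inequality in this union bound, $n! \cdot 2e^{-n\log n} \le e^{-n/2}$, is a consequence of Stirling's formula and can be checked on a logarithmic scale. The sketch below is an illustration, not part of the proof; it uses `lgamma` to avoid overflow, and the function name is hypothetical. Numerically the inequality holds from $n = 5$ onwards.

```python
from math import lgamma, log

def log_union_bound(n):
    """log( n! * 2 * exp(-n log n) ), computed via lgamma to avoid overflow."""
    return lgamma(n + 1) + log(2) - n * log(n)

for n in (5, 10, 100, 1000):
    print(n, log_union_bound(n), -n / 2)
```

Since $n!/n^n$ decays like $e^{-n}$ up to a polynomial factor, a failure probability of $2e^{-n\log n}$ per bijection is what is needed to survive a union bound over all $n!$ bijections.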

To finish the proof of the lemma note that $\gamma$, defined in Theorem 1.1, satisfies $\gamma = \Theta_k(\gamma')$. Also
observe that for $p$, $q$ satisfying both $pq\binom{n}{k} < 4n\log n$ and $pqN > \frac{1}{30}\log n$, where $N = \binom{n}{k-1}$, we have




4 Lower bounds 

In this section we prove the lower bounds in Theorem 1.1. As we previously explained, it is enough
to obtain these bounds for $\mathrm{disc}_P(G,H)$. We divide the proof into two cases. The first (dense case)
will be discussed in the next subsection. The second (sparse case) will be discussed in subsection 4.2.
Throughout the proofs, we assume that k is fixed and n is tending to infinity. 



4.1 Dense Case 

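Let $N = \binom{n - n/k}{k-1}$ and let $p$, $q$ be such that $pqN > \frac{1}{30}\log n$. Select an arbitrary set $L \subset V$ of size $|L| = \frac{n}{k}$.
We prove that w.h.p. there exists an $L$-bijection $\pi : V \to V$ with overlap

$$e(G_\pi \cap H) \ge pq\binom{n}{k} + \Theta_k\left(n \cdot \sqrt{pqN\log n}\right) = pq\binom{n}{k} + \Theta_k\left(\sqrt{pq\binom{n}{k}n\log n}\right), \qquad (4)$$

where an $L$-bijection $\pi : V \to V$ is a bijection from $V$ to $V$ which only permutes the elements of $L$,
i.e., $\pi(x) = x$ for all $x \notin L$.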

From the random hypergraph $G$ we construct a random bipartite graph $\widetilde{G}$ with vertex set $L_G \cup R$,
where $L_G = L$ and $R$ is the set of all $(k-1)$-tuples in $V \setminus L$. Note that $|R| = N$. The vertices $v_1 \in L_G$
and $\{v_2, v_3, \ldots, v_k\} \in R$ are adjacent if $\{v_1, v_2, \ldots, v_k\}$ forms an edge in the hypergraph $G$. With
slight abuse of notation, we view $\widetilde{G}$ as a sub-hypergraph of $G$, containing all edges $e$ having exactly
one vertex in $L$, i.e. $|e \cap L| = 1$. Similarly, from the random hypergraph $H$ we construct a random
bipartite graph $\widetilde{H}$ with vertex set $L_H \cup R$. Figure 1 shows the resulting bipartite graphs.
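
The bipartite construction can be made concrete in a few lines. The sketch below builds, for each vertex of $L$, its neighborhood in $R$ (the $(k-1)$-sets outside $L$ that complete it to an edge), and checks on a random example that for an $L$-bijection $\pi$ the number of edges of $G_\pi \cap H$ with exactly one vertex in $L$ equals the sum of the codegrees of the pairs $(u, \pi(u))$. All function names, the seed and the parameters are illustrative assumptions.

```python
from itertools import combinations
import random

def link_sets(edges, L):
    """For each v in L: the (k-1)-sets w inside V \\ L such that w + {v} is an edge."""
    links = {v: set() for v in L}
    for e in edges:
        inside = [v for v in e if v in L]
        if len(inside) == 1:                      # edge has exactly one vertex in L
            links[inside[0]].add(frozenset(e) - {inside[0]})
    return links

def codegree_sum(G, H, L, pi):
    """Sum over u in L of codeg(u, pi(u)) in the two bipartite graphs."""
    NG, NH = link_sets(G, L), link_sets(H, L)
    return sum(len(NG[u] & NH[pi[u]]) for u in L)

def overlap_with_one_vertex_in_L(G, H, L, pi):
    """Edges of the overlap of G_pi and H with exactly one vertex in L (pi fixes V \\ L)."""
    H_edges = {frozenset(e) for e in H}
    images = (frozenset(pi.get(v, v) for v in e) for e in G)
    return sum(1 for f in images if f in H_edges and len(f & L) == 1)

random.seed(0)
triples = list(combinations(range(9), 3))
G, H = random.sample(triples, 40), random.sample(triples, 40)
L = frozenset({0, 1, 2})
pi = {0: 2, 1: 0, 2: 1}                           # an L-bijection, given on L only
print(codegree_sum(G, H, L, pi) == overlap_with_one_vertex_in_L(G, H, L, pi))  # True
```

This identity is what allows the overlap restricted to the two bipartite graphs to be analysed as a sum of codegrees along a matching between $L_G$ and $L_H$.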

Given an $L$-bijection $\pi : V \to V$, we divide the edge set of $G_\pi \cap H$ into two subsets: the edge set of
$\widetilde{G}_\pi \cap \widetilde{H}$ and its complement. To prove our result we first expose the random edges in $\widetilde{G}$ and $\widetilde{H}$, and show
how to find an $L$-bijection $\pi$ having overlap at least $\Theta_k\left(n \cdot \sqrt{pqN\log n}\right)$ more than the expectation.
Then we fix such $\pi$ and expose all the remaining edges in $G$ and $H$ showing that the contribution
of these edges to $G_\pi \cap H$ does not deviate much from the expected contribution. More precisely, let
$e_\pi = |E((G - \widetilde{G})_\pi) \cap E(H - \widetilde{H})|$, then $e(G_\pi \cap H) = e(\widetilde{G}_\pi \cap \widetilde{H}) + e_\pi$. Moreover, $e_\pi$ is distributed
according to $\mathrm{Bin}(m, pq)$, where $m = \binom{n}{k} - N\frac{n}{k} \le \binom{n}{k}$. Thus w.h.p. $|e_\pi - pqm| \le \sqrt{pqm} \cdot \log n$,
as Lemma 2.1 shows. Since $\sqrt{pqm} \cdot \log n \ll \sqrt{pq\binom{n}{k}n\log n}$, in order to obtain (4), it is enough to show
that w.h.p. there exists an $L$-bijection $\pi$ such that

$$e(\widetilde{G}_\pi \cap \widetilde{H}) \ge \frac{n}{k} \cdot \left(pqN + \Theta_k\left(\sqrt{pqN\log n}\right)\right). \qquad (5)$$

We define an auxiliary bipartite graph $\Gamma = \Gamma(\widetilde{G}, \widetilde{H})$ as follows. A vertex $u \in L_G$ survives if
$|\deg_{\widetilde{G}}(u) - pN| \le 2\sqrt{2pN}$ and similarly, a vertex $v \in L_H$ survives if $|\deg_{\widetilde{H}}(v) - qN| \le 2\sqrt{2qN}$. Let
$S_G$ and $S_H$ be the sets of all surviving vertices of $\widetilde{G}$ and $\widetilde{H}$, respectively. Let $s_G = |S_G|$ and $s_H = |S_H|$.
The set of vertices of $\Gamma$ is the union of $S_G$ and $S_H$. The edges of $\Gamma$ are defined by the property

$$u \sim_\Gamma v \iff \mathrm{codeg}(u,v) \ge \frac{\deg_{\widetilde{G}}(u)\deg_{\widetilde{H}}(v)}{N} + 10^{-2}\sqrt{pqN\log n},$$

where $\mathrm{codeg}(u,v)$ denotes the codegree of $u \in L_G$ and $v \in L_H$, i.e. $\mathrm{codeg}(u,v) = |N_{\widetilde{G}}(u) \cap N_{\widetilde{H}}(v)|$.
The graph $\Gamma$ has many vertices in both parts, as the following simple lemma demonstrates.

Lemma 4.1. W.h.p. each part of $\Gamma$ has size at least $\frac{n}{4k}$.

Proof. Let $a$ be the probability that some vertex $u$ survives in $L_G$. Since $pN \ge \omega(n) \ge 8$, we have
that $2\sqrt{2pN} \le pN$. Thus Lemma 2.1 applied to $\deg_{\widetilde{G}}(u)$ implies $a \ge 1 - 2e^{-2} \ge 1/2$. Since the
events that vertices survive are independent, $s_G$ stochastically dominates the binomial distribution
with parameters $n/k$ and $1/2$. Thus, again by Lemma 2.1 w.h.p. $s_G \ge n/(4k)$ and a similar estimate
holds for $s_H$. □

To prove (5), we will show that the following two statements hold w.h.p.

(a) T has a matching M = {(u\, v\), . . . , (ui,vi)} of size I = 

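(b) there exists an $L$-bijection $\pi$ such that $\pi(u_i) = v_i$ for all $i = 1, 2, \ldots, l$, and

$$\sum_{u \in L_G \setminus \{u_1, u_2, \ldots, u_l\}} \mathrm{codeg}(u, \pi(u)) \ge \left(\frac{n}{k} - l\right)pqN - \frac{2n}{k}\sqrt{pqN}.$$

Indeed, for any two adjacent vertices $u$, $v$ in $\Gamma$, we have

$$\frac{\deg_{\widetilde{G}}(u)\deg_{\widetilde{H}}(v)}{N} \ge \frac{(pN - \sqrt{8pN})(qN - \sqrt{8qN})}{N} \ge pqN - 6\sqrt{pqN}.$$

Thus using (a), (b) and the value of $l$ we obtain

$$\begin{aligned}
e(\widetilde{G}_\pi \cap \widetilde{H}) &= \sum_{u \in L_G} \mathrm{codeg}(u, \pi(u)) \ge \sum_{i=1}^{l} \mathrm{codeg}(u_i, v_i) + \left(\frac{n}{k} - l\right)pqN - \frac{2n}{k}\sqrt{pqN} \\
&\ge \sum_{i=1}^{l}\left[\frac{\deg_{\widetilde{G}}(u_i)\deg_{\widetilde{H}}(v_i)}{N} + 10^{-2}\sqrt{pqN\log n}\right] + \left(\frac{n}{k} - l\right)pqN - \frac{2n}{k}\sqrt{pqN} \\
&\ge \sum_{i=1}^{l}\left[pqN - 6\sqrt{pqN}\right] + l \cdot 10^{-2}\sqrt{pqN\log n} + \left(\frac{n}{k} - l\right)pqN - \frac{2n}{k}\sqrt{pqN} \\
&\ge \frac{n}{k}\left(pqN + \Theta_k\left(\sqrt{pqN\log n}\right)\right).
\end{aligned}$$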
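We need the following lemma in order to prove that (b) holds.

Lemma 4.2. Let $0 < \alpha < 1$ be any absolute constant. Then with probability at least $1 - e^{-n/k}$, any two
subsets $A \subset L_G$ and $B \subset L_H$ with $|A| = |B| = \frac{\alpha n}{k}$ satisfy

$$X_{A,B} := \sum_{u \in A,\, v \in B} \mathrm{codeg}(u,v) \ge \left(\frac{\alpha n}{k}\right)^2 pqN - 2\alpha\left(\frac{n}{k}\right)^2\sqrt{pqN}.$$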



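Proof. Let $X_{w,u,v}$ be the indicator of $wu \in E(\widetilde{G})$ and $wv \in E(\widetilde{H})$ for $w \in R$, $u \in A$, $v \in B$. So
$X_{A,B} = \sum_{w \in R,\, u \in A,\, v \in B} X_{w,u,v}$ and $E[X_{w,u,v}] = pq$. Moreover, $X_{w,u,v}$ and $X_{w',u',v'}$ are dependent if and
only if $wu = w'u'$ or $wv = w'v'$. Thus, $\mu = E[X_{A,B}] = \left(\frac{\alpha n}{k}\right)^2 Npq$ and

$$\Delta = \sum_{w \in R,\, u \in A}\ \sum_{v, v' \in B} E[X_{w,u,v}X_{w,u,v'}] + \sum_{w \in R,\, v \in B}\ \sum_{u, u' \in A} E[X_{w,u,v}X_{w,u',v}] \le \left(\frac{\alpha n}{k}\right)^3 Npq(p+q),$$

where $\mu$ and $\Delta$ are defined as in Lemma 2.2. Let $F$ be the event that there exists at least one pair of
subsets $A \subset L_G$, $B \subset L_H$ with $|A| = |B| = \frac{\alpha n}{k}$ satisfying $X_{A,B} < \left(\frac{\alpha n}{k}\right)^2 Npq - 2\alpha\left(\frac{n}{k}\right)^2\sqrt{Npq}$. By the
union bound and by Lemma 2.2 we have

$$P[F] \le \sum_{A \in \binom{L_G}{\alpha n/k},\ B \in \binom{L_H}{\alpha n/k}} P\left[X_{A,B} < \mu - 2\alpha\left(\frac{n}{k}\right)^2\sqrt{Npq}\right] \le \binom{n/k}{\alpha n/k}^2 e^{-\frac{\left(2\alpha (n/k)^2\sqrt{Npq}\right)^2}{2\mu + \Delta}} \le \left(\frac{e}{\alpha}\right)^{\frac{2\alpha n}{k}} e^{-\frac{3n}{\alpha k}} \le e^{-\frac{n}{k}},$$

since $2\mu + \Delta \le \frac{4}{3}\left(\frac{\alpha n}{k}\right)^3 Npq$, $\alpha < 1$ and $\alpha\log(e/\alpha) \le 1$ for all such $\alpha$. □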



Let $M = \{(u_1, v_1), \ldots, (u_l, v_l)\}$ be a matching satisfying (a) and let $A = L_G \setminus \{u_1, u_2, \ldots, u_l\}$
and $B = L_H \setminus \{v_1, v_2, \ldots, v_l\}$. One can write $|A| = |B| = \frac{n}{k} - l = \frac{\alpha n}{k}$ where $\alpha = 1 - \frac{kl}{n}$. Consider
$X_{A,B} = \sum_{u \in A,\, v \in B} \mathrm{codeg}(u,v)$. Then, by Lemma 4.2 with probability at least $1 - e^{-n/k}$, we have

$$\sum_{u \in A,\, v \in B} \mathrm{codeg}(u,v) \ge \left(\frac{n}{k} - l\right)^2 pqN - \frac{2n}{k}\left(\frac{n}{k} - l\right)\sqrt{pqN}.$$

Since the complete bipartite graph with parts $A$, $B$ is a disjoint union of $\frac{n}{k} - l$ perfect matchings, by
the pigeonhole principle, there exists a matching $M'$ between $A$ and $B$ such that

$$\sum_{(u,v) \in M'} \mathrm{codeg}(u,v) \ge \frac{X_{A,B}}{\frac{n}{k} - l} \ge \left(\frac{n}{k} - l\right)pqN - \frac{2n}{k}\sqrt{pqN}.$$

Then the matching $M \cup M'$ between $L_G$ and $L_H$ gives the desired $L$-bijection $\pi$ and proves (b).

To finish the proof we need to establish (a). If $\Gamma$ is nearly regular, then by Lemma 2.6 $\Gamma$ would
contain a linear-size matching. Unfortunately this is not the case. However, we will show that it is
possible to delete some edges of $\Gamma$ at random and obtain a pruned graph $\Gamma'$, which is nearly regular.
Let

$$f(d_1, d_2) := P\left[u \sim_\Gamma v \mid \deg_{\widetilde{G}}(u) = d_1, \deg_{\widetilde{H}}(v) = d_2\right],$$

where $|d_1 - pN| \le 2\sqrt{2pN}$ and $|d_2 - qN| \le 2\sqrt{2qN}$. Let $f_0$ be the minimum of $f(d_1, d_2)$ over all pairs
$(d_1, d_2)$ in the domain of $f$. Suppose that $f_0 \ge n^{-1/2}$, which we shall prove later. We keep each edge
$uv$ of $\Gamma$ in $\Gamma'$ independently with probability $\frac{f_0}{f(d_1, d_2)}$, where $d_1 = \deg_{\widetilde{G}}(u)$ and $d_2 = \deg_{\widetilde{H}}(v)$. Then,
we claim that for any vertex $u \in S_G$, $\deg_{\Gamma'}(u)$ is binomially distributed with parameters $s_H$ and $f_0$.
Indeed, by definition, $P\left[u \sim_{\Gamma'} v \mid \deg_{\widetilde{G}}(u) = d_1, \deg_{\widetilde{H}}(v) = d_2\right] = f_0$ for all possible $d_1$, $d_2$. Moreover,
conditioning on the neighbors of $u$ in $\widetilde{G}$ and on the values of the degrees $\deg_{\widetilde{H}}(v_1), \deg_{\widetilde{H}}(v_2), \ldots,
\deg_{\widetilde{H}}(v_m)$, the events $u \sim_\Gamma v_1$, $u \sim_\Gamma v_2$, $\ldots$, and $u \sim_\Gamma v_m$ are all independent. Therefore, by definition
of $\Gamma'$, it is easy to see that $u \sim_{\Gamma'} v_1$, $u \sim_{\Gamma'} v_2$, $\ldots$, and $u \sim_{\Gamma'} v_m$ are independent as well. Thus for
any $u \in S_G$, $\deg_{\Gamma'}(u) \sim \mathrm{Bin}(s_H, f_0)$ and similarly, $\deg_{\Gamma'}(v) \sim \mathrm{Bin}(s_G, f_0)$ for all $v \in S_H$.

Conditioning on the degrees of all vertices in G, H, we obtain sets Sg and Sh, which w.h.p. satisfy 
the assertion of Lemma ELTI i.e., \Sg\ = sq > jj; and \Sh\ = sh > il- Thus both sg/o and s#/o are 
Since all degrees in T' are binomially distributed, Lemma [4.11 together with the union bound 
imply that w.h.p. all vertices u G Sg,v G Sh satisfy 

— ^— < deg r ,(u) < — ^— and — ^— < deg r ,(v) < — ^— . 
Therefore, the max-degree A(T') < max { , 25a/o J < Msl and e ( r ') > £g^/o > gb_ Thug by 
Lemma |2"U| T' has a matching of size at least — 5Sfc> com pl e ting the proof of (a). 

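It remains to prove the bound $f_0 \ge n^{-1/2}$. Let $K = \frac{\log n}{5000} \ge 1$. Since $pN$ tends to infinity, $p \le q \le 1/2$
and $|d_1 - pN| \le 2\sqrt{2pN}$, we have $1 \le d_1 = (1+o(1))pN \le \frac{2N}{3}$. Similarly $1 \le d_2 = (1+o(1))qN \le \frac{2N}{3}$.
Also recall that $pqN > \frac{1}{30}\log n$, which implies

$$\frac{d_1 d_2}{100N} = (1+o(1))\frac{pqN}{100} \ge (1+o(1))\frac{\log n}{3000} \ge K.$$

Therefore we can apply Lemma 2.4 with $\Delta = \sqrt{\frac{d_1 d_2 K}{N}} \ge \frac{\sqrt{pqN\log n}}{100}$. By the definition of $f(d_1, d_2)$, we
have

$$f(d_1, d_2) = \sum_{t \ge \frac{d_1 d_2}{N} + \frac{\sqrt{pqN\log n}}{100}} \frac{\binom{d_1}{t}\binom{N-d_1}{d_2-t}}{\binom{N}{d_2}} \ge \sum_{t \ge \frac{d_1 d_2}{N} + \Delta} \frac{\binom{d_1}{t}\binom{N-d_1}{d_2-t}}{\binom{N}{d_2}} \ge e^{-40K} \ge n^{-1/2}.$$

This completes the proof. □

4.2 Sparse case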

In this subsection, we prove the lower bound in the sparse case $pqN \le \frac{1}{30}\log n$. Note that, since $p \le q$
in this case, we have $p \le N^{-1/2 + o(1)}$. The proof runs along the same lines as that of the dense case
differing only in the application of Lemma 2.5 to obtain an $L$-bijection $\pi : V \to V$ whose sum of
codegrees $\sum_{u \in L_G} \mathrm{codeg}(u, \pi(u))$ is large. Suppose first that $pN \ge \frac{\log n}{5\log\gamma}$. Recall that $\gamma = \frac{\log n}{pqN} \ge 30$
and thus $\frac{\log n}{6\log\gamma} \ge \frac{\log n}{20\log\gamma} + \frac{\log n}{\gamma} = \frac{\log n}{20\log\gamma} + pqN$. Therefore it is enough to find a bijection $\pi$ between
$L_G$ and $L_H$ such that $\sum_{u \in L_G} \mathrm{codeg}(u, \pi(u)) \ge (1+o(1))\frac{n}{k} \cdot \frac{\log n}{6\log\gamma}$.

Partition the vertices of $L_G$ into $r = \frac{n^{3/5}}{k}$ disjoint sets $S_1, \ldots, S_r$ each of size $s = n^{2/5}$. We will
construct $\pi$ by applying the following greedy algorithm to each set. Let us start with $S_1$. The
algorithm will reveal the edges emanating from $S_1$ to $R$ in $\widetilde{G}$ by repeatedly exposing the neighborhood
of a vertex in $S_1$, one at a time. Throughout this process, we construct a subset $S_1' \subset S_1$ of size
$(1 - o(1))|S_1|$ and a family of disjoint sets $N_u \subset R$, such that each $N_u$ has size $(1+o(1))Np$ and is
contained in the neighborhood of $u$, for all $u \in S_1'$. At each step, we pick a fresh vertex $u$ in $S_1$ and
expose its neighborhood. If $u$ has a set of $(1+o(1))Np$ neighbors which is disjoint from $N_w$ for all
$w$ in the current $S_1'$, denote this particular set by $N_u$ and put $u$ in the set $S_1'$; otherwise move to the
next step. At every step, the union $X = \bigcup_{w \in S_1'} N_w$ has size at most $O(pN \cdot s) \le N^{0.9 + o(1)}$. Moreover,
every vertex in $R \setminus X$ is adjacent to $u$ independently with probability $p$. Since $pN \ge \omega(n)$ tends to
infinity with $n$, the set of neighbors of $u$ outside $X$ has size $(1+o(1))|R \setminus X|p = (1+o(1))Np$ with
probability $1 - o(1)$. Furthermore, for different vertices such events are independent. Therefore, by
Lemma 2.1 w.h.p. $|S_1'| = (1 - o(1))|S_1|$.

Now we will construct the partial matching for $S_1$. Consider
the disjoint sets $N_u$, for $u \in S_1'$, each of size $(1+o(1))Np$. Pick an arbitrary vertex $v$ in $L_H$ and expose
its neighbors in $\widetilde{H}$. This is a random subset $N_v$ of $R$, obtained by taking each element independently
with probability $q$. Therefore by case (1) of Lemma 2.5 w.h.p. there is a vertex $u \in S_1'$ such that
$\mathrm{codeg}(u,v) \ge |N_u \cap N_v| \ge \frac{\log n}{6\log\gamma}$. Define $\pi(u) = v$, remove $u$ from $S_1'$, remove $v$ from $L_H$ and continue.
Note that, as long as there are at least $n^{1/3}$ vertices remaining in $S_1'$, we can match one of them with a
newly exposed vertex from $L_H$ such that the codegree of this pair is at least $\frac{\log n}{6\log\gamma}$. Once the number
of vertices in $S_1'$ drops below $n^{1/3}$, leave the remaining vertices unmatched. W.h.p. we can match a
$1 - o(1)$ fraction of the vertices in $S_1$.

Continue the above procedure for $S_2, \ldots, S_r$ as well. At the end of the process, we will have
matched a $1 - o(1)$ fraction of all the vertices in $L_G$ with distinct vertices in $L_H$ such that the codegree of
every matched pair is at least $\frac{\log n}{6\log\gamma}$. Therefore the sum of the codegrees of this partial matching is at
least $(1+o(1))\frac{n}{k} \cdot \frac{\log n}{6\log\gamma}$. To obtain the bijection $\pi$, one can match the remaining vertices in $L_G$ and
$L_H$ arbitrarily.

When $pN < \frac{\log n}{5\log\gamma}$, the same proof as above together with case (2) of Lemma 2.5 yields a bijection
$\pi$ such that $\sum_{u \in L_G} \mathrm{codeg}(u, \pi(u)) \ge (1+o(1))\frac{n}{k} \cdot pN$. Since $q \le \frac{1}{2}$, this is at least $\left(\frac{1}{2} + o(1)\right)\frac{n}{k} \cdot pN$
more than the expectation, finishing the analysis of the sparse case. □

5 Concluding remarks 

As we stated in the introduction, Theorem 1.1 also yields tight bounds when $p$ and/or $q > \frac{1}{2}$. For any
$G$ and $H$, one can check that $\mathrm{disc}(G,H) = \mathrm{disc}(G, \overline{H})$, where $\overline{H}$ is the complement of $H$. Moreover, $\overline{H}$
is distributed according to $\mathcal{H}_k(n, 1-q)$, hence we can reduce the case $q > \frac{1}{2}$ to the case $q' = 1 - q \le \frac{1}{2}$;
the same holds when we take the complement of $G$ instead. We remark that one can determine the
discrepancy when $p$ is smaller than $\frac{\omega(n)}{N}$, but we chose not to discuss this range here, since the proof
is similar to the sparse case and it wouldn't provide any new insight.

The definition of discrepancy can be rephrased as $\mathrm{disc}(G,H) = \max\{\mathrm{disc}^+(G,H), \mathrm{disc}^-(G,H)\}$,
where $\mathrm{disc}^+(G,H) = \max_\pi e(G_\pi \cap H) - \rho_G\rho_H\binom{n}{k}$ and $\mathrm{disc}^-(G,H) = \rho_G\rho_H\binom{n}{k} - \min_\pi e(G_\pi \cap H)$ are
the one-sided relative discrepancies. In fact, all the lower bounds we obtained are for $\mathrm{disc}^+(G,H)$, and
some of them are not true for $\mathrm{disc}^-(G,H)$. This is because $\mathrm{disc}^-(G,H) \le \rho_G\rho_H\binom{n}{k} \approx pq\binom{n}{k}$ and in
the sparse case, $pq\binom{n}{k}$ could be much smaller than $\mathrm{disc}(G,H)$. Under the same hypothesis and using
similar ideas as in Theorem 1.1 one can show that

$$\mathrm{disc}^-(G,H) = \begin{cases} \Theta_k\left(\sqrt{pq\binom{n}{k}n\log n}\right) & \text{if } pqN > \frac{1}{30}\log n; \\ \Theta_k\left(pq\binom{n}{k}\right) & \text{otherwise.} \end{cases}$$

The last equation is related to the lower tail of the binomial distribution.

Lastly, we would like to mention that there are a substantial number of open problems about 
disc(G, H) and its related topics in [I]. 